Hashing with Generalized Nyström Approximation

Authors

  • Jeong-Min Yun
  • Saehoon Kim
  • Seungjin Choi
Abstract

Hashing, which involves learning binary codes to embed high-dimensional data into a similarity-preserving low-dimensional Hamming space, is often formulated as linear dimensionality reduction followed by binary quantization. Linear dimensionality reduction based on the maximum-variance formulation requires the leading eigenvectors of the data covariance or graph Laplacian matrix. Computing leading singular vectors or eigenvectors when both the dimension and the sample size are large is a main bottleneck in most data-driven hashing methods. In this paper we address the use of the generalized Nyström method, in which a subset of rows and columns is used to approximately compute the leading singular vectors of the data matrix, in order to improve the scalability of hashing methods for high-dimensional data with large sample size. In particular, we validate the useful behavior of the generalized Nyström approximation with uniform sampling for a recently developed hashing method based on principal component analysis (PCA) followed by iterative quantization, referred to as PCA+ITQ, developed by Gong and Lazebnik. We compare the generalized Nyström approximation with uniform and non-uniform sampling to the full singular value decomposition (SVD) method, confirming that uniform sampling dramatically improves the computational and space complexities while sacrificing little performance. In addition, we present low-rank approximation error bounds for the generalized Nyström approximation with uniform sampling, which are not a trivial extension of available results for the non-uniform sampling case.

Keywords: CUR decomposition; hashing; generalized Nyström approximation; pseudoskeleton approximation; uniform sampling
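To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the pseudoskeleton form A ≈ C W⁺ R, where C and R are uniformly sampled columns and rows of the centered data matrix A and W is their intersection, and it recovers approximate leading singular vectors from that factorization through two QR decompositions and a small SVD. The function names (`truncated_pinv`, `generalized_nystrom_svd`, `pca_sign_codes`), the tolerance, and the sample sizes are all hypothetical choices for illustration. Binary codes are obtained by plain sign thresholding of the PCA projection; the ITQ rotation step of PCA+ITQ is omitted.

```python
import numpy as np

def truncated_pinv(W, tol=1e-10):
    """Pseudo-inverse of W, discarding singular values below tol * largest."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    keep = s > tol * s[0]
    return (Vt[keep].T / s[keep]) @ U[:, keep].T

def generalized_nystrom_svd(A, n_rows, n_cols, k, rng):
    """Approximate top-k SVD of A from uniformly sampled rows and columns."""
    m, n = A.shape
    rows = rng.choice(m, size=n_rows, replace=False)  # uniform, no replacement
    cols = rng.choice(n, size=n_cols, replace=False)
    C = A[:, cols]               # m x c sampled columns
    R = A[rows, :]               # r x n sampled rows
    W = A[np.ix_(rows, cols)]    # r x c intersection block
    # A is approximated by C @ pinv(W) @ R; factor it without forming m x n.
    Qc, Tc = np.linalg.qr(C)     # C = Qc @ Tc
    Qr, Tr = np.linalg.qr(R.T)   # R = Tr.T @ Qr.T
    core = Tc @ truncated_pinv(W) @ Tr.T  # small c x r matrix
    Us, s, Vst = np.linalg.svd(core)
    return Qc @ Us[:, :k], s[:k], Qr @ Vst[:k].T

def pca_sign_codes(X, V, mean):
    """Project centered data onto the directions in V and threshold at zero."""
    return ((X - mean) @ V > 0).astype(np.uint8)

# Toy data: 500 samples in 64 dimensions, exactly rank 8 (an assumption that
# makes the sampled approximation essentially exact; real data is not like this).
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 8)) @ rng.standard_normal((8, 64))
mean = X.mean(axis=0)
A = X - mean

U, s, V = generalized_nystrom_svd(A, n_rows=60, n_cols=32, k=8, rng=rng)
codes = pca_sign_codes(X, V, mean)  # one 8-bit code per sample
```

In this sketch the full SVD of the 500 x 64 matrix is replaced by QR factorizations of the sampled blocks and an SVD of a 32 x 60 core matrix. On real data the sampled factorization is only approximate, and its quality depends on how many rows and columns are drawn.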


Similar resources

Randomized Clustered Nystrom for Large-Scale Kernel Machines

The Nyström method has been popular for generating the low-rank approximation of kernel matrices that arise in many machine learning problems. The approximation quality of the Nyström method depends crucially on the number of selected landmark points and the selection procedure. In this paper, we present a novel algorithm to compute the optimal Nyström low-rank approximation when the number of landm...


Double Nyström Method: An Efficient and Accurate Nyström Scheme for Large-Scale Data Sets

The Nyström method has been one of the most effective techniques for kernel-based approaches that scale well to large data sets. Since its introduction, there has been a large body of work that improves the approximation accuracy while maintaining computational efficiency. In this paper, we present a novel Nyström method that improves both accuracy and efficiency based on a new theoretical analy...


Generalized Intersection Kernel

Following the very recent line of work on the “generalized min-max” (GMM) kernel [7], this study proposes the “generalized intersection” (GInt) kernel and the related “normalized generalized min-max” (NGMM) kernel. In computer vision, the (histogram) intersection kernel has been popular, and the GInt kernel generalizes it to data which can have both negative and positive entries. Through an ext...


Improving CUR matrix decomposition and the Nyström approximation via adaptive sampling

The CUR matrix decomposition and the Nyström approximation are two important low-rank matrix approximation techniques. The Nyström method approximates a symmetric positive semidefinite matrix in terms of a small number of its columns, while CUR approximates an arbitrary data matrix by a small number of its columns and rows. Thus, CUR decomposition can be regarded as an extension of the Nyström a...


Scalable Kernel K-Means Clustering with Nystrom Approximation: Relative-Error Bounds

Kernel k-means clustering can correctly identify and extract a far more varied collection of cluster structures than the linear k-means clustering algorithm. However, kernel k-means clustering is computationally expensive when the non-linear feature map is high-dimensional and there are many input points. Kernel approximation, e.g., the Nyström method, has been applied in previous works to approx...



Publication date: 2012